Method and apparatus for storage command and data router

ABSTRACT

An interconnecting unit and method for data communication between a plurality of computer hosts and a plurality of storage devices. The interconnecting unit couples the hosts to the storage devices and enables the data communication. The interconnecting unit includes a plurality of device control units. Each of the device control units allows multiple commands to be distributed to multiple storage devices. The communication through the interconnecting unit is robust enough to tolerate a failure in one connection path between the hosts and the storage devices.

REFERENCE TO RELATED APPLICATIONS

This application claims the priority of U.S. Provisional Patent Application No. 60/462,336, entitled “Method And Apparatus For Storage Command And Data Router”, by Ghaffari, et al., filed Apr. 14, 2003, which is incorporated by reference as if set forth herein in its entirety.

BACKGROUND

The present invention relates to data storage systems. Specifically, the present invention describes a routing device that provides the ability to concurrently route command and data over a serial storage link protocol in a data storage system.

The need for large data storage motivates the building of large-scale and high-capacity storage systems. A high-capacity storage system requires a storage controller that enables a host to communicate with a large number of storage devices. A typical storage controller is a processor that receives a command from a host and translates the command into a format required for communication with the storage devices. Currently, this communication is serial in nature. The serial communication is enabled by storage serial protocols such as Serial Advanced Technology Attachment (SATA) or Serial Attached SCSI (SAS), which employ a point-to-point connection. A storage controller that uses a SATA or SAS-based protocol to communicate with the storage devices needs a switching or device control unit that interconnects the storage controller to the plurality of storage devices.

The communication between the host and the storage devices involves the exchange of command and data. The host sends a command to a storage device to perform a read or write operation. The read or write operation is performed after the storage device, to which the command is sent, has accepted the command. After the command has been accepted, data can either be read out from the storage device, or new data can be written into the storage device. This communication, based on the SATA or SAS-STP protocol, is made in the form of a “Frame Information Structure” (FIS). The FIS encapsulates the command and data being exchanged between the host and the storage device. Further details of FIS are described in Serial ATA specification 1.0, “Serial ATA: High Speed Serialized AT Attachment”, Aug. 29, 2001.

An interconnecting unit between the host and the storage devices enables the above-described communication. Two draft standards, based on SATA protocol, have been proposed for this interconnecting unit.

The first standard involves a port multiplier (PM) approach, which uses a multiplexer to multiplex an active host connection to up to 15 device connections. This enables utilization of the full bandwidth of the host connection. The PM approach requires the modification of the packet structure in SATA, in particular the FIS, to incorporate the PM routing field. Consequently, either the host or the interconnecting unit has to be modified to produce PM FIS (FIS specific to the port multiplier).

The second standard involves a routers, switches and multiplexers (RSM) approach. This approach consists of an interconnecting unit that allows command and data information to be directed from one or more hosts to multiple storage devices. The interconnecting unit utilizes a wrapper FIS into which all other FIS types can be encapsulated. The wrapper FIS includes a header that is used to define and activate a connection between route-aware devices. However, this approach requires a SATA-based host or interconnecting unit to support RSM, in order to provide this connectivity. In addition, all the elements in the SATA-based interconnecting unit have to be RSM route aware and able to process the header to forward the encapsulated FIS.

The interconnecting unit, based on either of the above-described two approaches, can be used to connect a plurality of storage devices to a plurality of hosts. This interconnecting unit contains multiple RSMs or multiplexers to connect the storage devices and the hosts.

An interconnecting unit is described in US patent application number US2004/0024950 titled ‘Method and Apparatus for Enhancing Reliability and Scalability of Serial Storage Devices’. This patent application describes a method and system for data communication between a plurality of hosts and a plurality of storage devices. The communication is achieved by using a switch architecture-based interconnecting unit.

Another interconnecting unit is described in WIPO Publication No. WO03091887 titled ‘Method and Apparatus for Dual Porting a Single Port Serial ATA Disk Drive’. The publication describes a method and system to connect single port devices to a plurality of hosts. The invention provides a switched assembly to selectively connect a storage device to one of the plurality of hosts.

The above-mentioned interconnecting units suffer from one or more of the following limitations. First, the implementation of an interconnecting unit based on the PM or RSM approach requires changes to be made to the architecture of hosts, which is expensive to implement. The RSM system is expensive because it involves host modification. Further, the RSM system is not backward compatible with existing solutions, and thus it also involves software changes. The cost of the port multiplier approach, on the other hand, depends on the host design. A host that allows commands to be sent to multiple devices can be very expensive, so a port multiplier solution built around such a host is expensive, although it has relatively high performance. Hosts that allow only one command to be active at a time are not expensive, so a port multiplier solution built around them is not expensive either, but this host/PM solution has lower performance.

Secondly, the interconnecting unit, based on either approach, is very complex.

For example, to connect ‘m’ hosts to ‘n’ storage devices, m*n multiplexers are required. As a result, the design complexity of the interconnecting unit increases substantially. Increased complexity results in the need for a larger chip area and, consequently, high cost and high power consumption. Thirdly, the interconnecting unit allows only a single command to be executed at a time. Even if multiple hosts issue multiple commands at the same time, the interconnecting unit is able to execute only one command at a time. Consequently, the throughput of the interconnecting unit is low. Fourthly, the interconnecting unit will fail to operate where a failure is encountered in the path between a host and a storage device. This means that the communication is not reliable and robust enough to tolerate faults, which lowers the availability of the storage devices.

In light of the above-described limitations, there is a need for a simple, cost-efficient interconnecting unit that enables a plurality of storage controllers to communicate with a large number of serially attached storage devices. The interconnecting unit should avoid changes in the architecture of the hosts, thereby reducing cost. Further, the interconnecting unit should have higher throughput and should allow reliable communication between the hosts and the storage devices. The mechanism should be robust enough to tolerate at least a single point of failure and not limit the availability of the storage devices in case of a fault.

SUMMARY

The present invention is directed at data storage systems. Specifically, the present invention is directed at a routing device that provides the ability to concurrently route command and data over a serial-storage link protocol in a data storage system.

An aspect of the present disclosure is to provide a system and method to enable connectivity between a plurality of hosts and a plurality of storage devices.

Another aspect of the present disclosure is to provide a system such that multiple commands can be distributed to the storage devices, thus having multiple storage devices active concurrently.

Another aspect of the present disclosure is a provision of fault identification and isolation to reduce the time required to repair failures.

Another aspect of the present disclosure is a provision of load balancing between the hosts.

Another aspect of the present disclosure is a provision of data redundancy for error detection and correction.

Yet another aspect is a provision for data parallelism for increased performance.

The above aspects are attained by a data storage system comprising a plurality of hosts, a plurality of storage devices and an interconnecting unit for coupling the plurality of hosts and the plurality of storage devices. The interconnecting unit comprises a plurality of host interface units, a plurality of device control units, an interconnect routing unit and a plurality of device interface units. The device control unit enables distribution of multiple commands from a single host to more than one storage device. This is achieved by not requiring the device control unit to wait for the completion of the submitted command before allowing another command to be submitted, and by enabling queuing at the host layer. The device control unit also monitors the status of various components, thereby locating any fault that may occur during the communication between the hosts and the storage devices. The system and method of the present invention have numerous advantages over the prior art:

First, the device control unit is simple in design and less complex than existing device control units.

Second, the device control unit achieves a higher throughput than the existing device control units.

Third, the use of a plurality of device control units creates path redundancy as well as data parallelism.

Fourth, the invention reduces the time for recovery in the case of failures.

Fifth, the present invention also allows a non-queue capable device, or a different generation of a storage device, to be connected to a queue capable host.

Finally, the device control unit processes data destined for the storage devices.

Data destined for the storage devices can be processed to produce error correction code (ECC) to provide data integrity, to compress the data to reduce actual storage, or to encrypt the data for data security reasons.

BRIEF DESCRIPTION OF THE DRAWINGS

The preferred embodiments of the invention will hereinafter be described in conjunction with the appended drawings provided to illustrate and not to limit the invention, wherein like designations denote like elements, and in which:

FIG. 1 illustrates a data storage system, in accordance with an embodiment of the present invention;

FIG. 2 is a block diagram illustrating the essential elements of a device control unit in accordance with an embodiment of the present invention;

FIG. 3 is a block diagram of the target transmit and receive block in accordance with an embodiment of the present invention;

FIG. 4 is a block diagram of the device selector in accordance with an embodiment of the present invention;

FIG. 5 is a block diagram of the initiator transmit and receive block in accordance with an embodiment of the present invention;

FIG. 6 a is a block diagram of the host interface unit in accordance with an embodiment of the present invention;

FIG. 6 b is a detailed block diagram of the device interface unit in accordance with an embodiment of the present invention; and

FIGS. 7 a and 7 b show a flowchart illustrating the method of data communication between a plurality of hosts and storage devices.

DESCRIPTION OF PREFERRED EMBODIMENTS

For the sake of convenience, the terms used to describe various embodiments are defined below. It should be understood that these are provided merely to aid the understanding of the description, and that the definitions in no way limit the scope of the invention.

Command—A command refers to a special packet structure such as a Frame Information Structure (FIS), whose payload contains an instruction. The commands are generated by the host and executed by the storage devices. The storage devices generate a status to report the result of the execution of the command to the host.

Data—Data is the user data that is written on a storage device or read from the storage device. A command initiates the read or write operation of data.

Serial I/O structure—A serial I/O structure comprises a command to a storage device, some or no data transfer, and a completion status from the storage device. The serial I/O structure contains all the bit data necessary to achieve a serial communication, referred to as primitives.

I/O Structure—Bit data on a 32-bit boundary that has been extracted from the serial I/O structure. This structure may contain a command, user data or status, and is free of any protocol used in the serial I/O structure necessary for communication in serial format, i.e., in the I/O structure the primitives of the serial I/O structure are removed.

The present invention relates to a method and system for data communication between a plurality of hosts and a plurality of storage devices by using an interconnecting unit. In particular, the invention describes a device control unit that allows a plurality of commands from the hosts to be distributed concurrently to the plurality of storage devices.

FIG. 1 is a block diagram illustrating a data storage system in accordance with the present invention. A data storage system 100 comprises an interconnecting unit 104 coupling a plurality of hosts 102 to a plurality of storage devices 106. The coupling comprises the exchange of serial I/O structures between plurality of hosts 102 and plurality of storage devices 106. Host 102 generates at least one command for coupling to at least one of plurality of storage devices 106.

The elements of the serial I/O structure vary with the serial protocol used. For example, if SATA is used as the serial protocol, the serial I/O structure is composed of a command and status FIS and, if there is exchange of data, data FISes and any other FIS necessary for exchange of a data FIS. If SAS-SCSI is used as the serial protocol, then the serial I/O structure is sent in the SCSI command structure format. For the purpose of illustration, the invention has been described using the FIS format for the serial I/O structure, although the invention is equally applicable to other I/O structures.

Each host 102 is queue capable, i.e., multiple commands can be transmitted without waiting for the completion of the previous commands in the queue. Interconnecting unit 104 is capable of processing the serial I/O structures for plurality of storage devices 106 and hosts 102 and establishing communication between one of storage devices 106 and host 102 based on a given fairness algorithm. An example of a fairness algorithm can be one wherein the order in which interconnecting unit 104 processes storage devices 106 is the order in which the commands are received from host 102. Another example is that interconnecting unit 104 processes the commands in the order in which plurality of storage devices 106 are ready to process them.
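The two example fairness orderings described above can be illustrated with a minimal Python sketch. The function names, the (tag, device) tuple layout and the device identifiers are assumptions made for illustration only; the specification does not prescribe any particular implementation.

```python
from collections import defaultdict, deque


def service_in_arrival_order(commands):
    """First example: service commands in the order the host sent them.

    `commands` is a list of (tag, device) pairs in arrival order
    (a hypothetical representation, not taken from the specification).
    """
    return [tag for tag, _device in commands]


def service_in_ready_order(commands, ready_events):
    """Second example: service commands as their target devices become ready.

    `ready_events` is the sequence of device identifiers that signalled
    readiness; each event services the oldest pending command for that device.
    """
    pending = defaultdict(deque)
    for tag, device in commands:
        pending[device].append(tag)

    serviced = []
    for device in ready_events:
        if pending[device]:  # ignore devices with nothing outstanding
            serviced.append(pending[device].popleft())
    return serviced


commands = [(0, "dev2"), (1, "dev0"), (2, "dev2")]
print(service_in_arrival_order(commands))
print(service_in_ready_order(commands, ["dev0", "dev2", "dev1", "dev2"]))
```

In this sketch the second ordering lets a fast device be serviced ahead of a slower one that received its command earlier, which is the behavior the second example describes.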

Interconnecting unit 104 comprises a plurality of host interface units 107, a plurality of device control units 108, an interconnect routing unit 110 and a plurality of device interface units 112. There is one device control unit 108 for each host 102. Each device control unit 108 communicates with host 102 and selects storage device 106 to which the received command is to be routed. Each device control unit 108 also controls interconnect routing unit 110 to route the serial I/O structure to device interface unit 112. In an embodiment of the present invention, interconnect routing unit 110 comprises a plurality of multiplexers, each multiplexer enabling connection between device control unit 108 and device interface unit 112 corresponding to selected storage device 106. There is a device interface unit 112 for each storage device 106. Device interface unit 112 maintains a point-to-point connection with storage device 106 until device control unit 108 selects device interface unit 112. After the selection, device control unit 108 handles the communication with storage device 106.

Please note that interconnect routing unit 110, using a plurality of multiplexers, ensures that input to each device control unit 108 can be received from any device interface unit 112, and output of each device control unit 108 can go to any device interface unit 112.

It should be noted that in an embodiment of the invention, interconnecting unit 104 is implemented as a Field Programmable Gate Array (FPGA). In an alternate embodiment, interconnecting unit 104 can be implemented as an Application Specific Integrated Circuit (ASIC).

Host interface unit 107 and device interface unit 112 synchronize the serial data received from their respective connections to the running internal clock of device control unit 108 via an elasticity buffer. Synchronization is essential since device control unit 108, hosts 102 and storage devices 106 need not operate on the same clock, i.e., the clock rate need not be the same for device control unit 108, hosts 102 and storage devices 106. Even if the clock rates are the same, there is still a need for an elasticity buffer due to differences between the extracted clock from the physical interface (connecting a host 102 and interconnecting unit 104 as well as a storage device 106 and interconnecting unit 104) and the running internal clock of device control unit 108. Additionally, host interface unit 107 and device interface unit 112 extract the bits from the serial bit stream directed from host 102 and storage device 106 respectively. The bits are extracted in a character bit format such as a 10-bit character format.

Device control unit 108 directs multiple commands from hosts 102 to more than one storage device 106. Therefore, interconnecting unit 104 enables coupling between plurality of hosts 102 and plurality of storage devices 106.

The point-to-point connection between each host 102 and interconnecting unit 104 is over a single serial link using host interface unit 107. Similarly, the point-to-point connection between storage devices 106 and interconnecting unit 104, through device interface unit 112, is over a single serial storage link. The serial storage link can be implemented using protocols such as SATA or SAS. The SATA and SAS protocols are well known in the art, and their applications in the present invention should be apparent to one skilled in the art.

FIG. 2 illustrates the elements of device control unit 108 in accordance with an embodiment of the present invention. Device control unit 108 comprises a target transmit and receive block 204, a device selector 206, and an initiator transmit and receive block 208. Further, device control unit 108 comprises an error register 210 and a plurality of data processing logic units 212.

Target transmit and receive block 204 monitors the processed command/data (in the form of a serial I/O structure) from host 102 in the 10-bit character format. In an embodiment that uses a SATA serial link, the serial I/O structure is sent in the form of FIS. Target transmit and receive block 204 converts the received FIS, in the character format, into a fixed data format. In an embodiment of the invention, target transmit and receive block 204 converts the 10-bit character format into a 32-bit data format. The elements of target transmit and receive block 204 will be further described by means of FIG. 3.

Device selector 206 monitors the received element of the serial I/O structure from target transmit and receive block 204, to identify the presence of a command. In case the received element of the serial I/O structure is a command, one of storage devices 106, to which the command is directed, is selected. This selection is based on a tag for the command that stores the identity of storage device 106 to which the command is directed. After this selection, the command is modified according to the type of storage device 106 that is selected. For example, a queued read or write command from host 102 destined to a non-queue capable storage device 106 is modified to relocate the tag and the command opcode before it is forwarded to initiator transmit and receive block 208.

The location of the tag, identifying the selected storage device, in the command depends on storage device 106 to which the command is being directed. In an embodiment of the invention, for SATA commands directed to non-queue capable storage device 106, higher bits of the address are used as the tag to represent storage device 106. In case the command is directed to queue capable storage device 106, the location of the tag is as per the SATA standard. Please note that the location of the tag as part of the address is design specific. It should be apparent to a person skilled in the art that there are other methods of tagging, such as using an unused field of the command. Alternatively, non-queued SATA commands directed to queue capable or non-queue capable storage devices 106 can be wrapped into a queue command received by interconnecting unit 104, which in turn unwraps the command into a regular SATA command to storage device 106. This option allows some hosts that are not otherwise capable of interleaving non-queued commands to queue them as if they were queue capable. It should be apparent to a person skilled in the art that the method of wrapping of a non-queue command is implementation specific and does not limit the scope of the invention.
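The use of higher address bits as a device tag can be sketched as follows. Since the specification states that the tag location is design specific, the 48-bit address width, the 4-bit tag width and the function names below are purely hypothetical choices made for illustration.

```python
# Hypothetical layout: a 48-bit address whose top 4 bits carry the device tag.
ADDRESS_BITS = 48
TAG_BITS = 4
TAG_SHIFT = ADDRESS_BITS - TAG_BITS
ADDRESS_MASK = (1 << TAG_SHIFT) - 1


def encode_tagged_address(device_id, address):
    """Place the device identity in the high bits of the address field."""
    if not 0 <= device_id < (1 << TAG_BITS):
        raise ValueError("device_id does not fit in the tag field")
    if not 0 <= address <= ADDRESS_MASK:
        raise ValueError("address overlaps the tag field")
    return (device_id << TAG_SHIFT) | address


def decode_tagged_address(tagged):
    """Split a tagged address into (device_id, address) for routing."""
    return tagged >> TAG_SHIFT, tagged & ADDRESS_MASK


tagged = encode_tagged_address(5, 0x1234)
print(decode_tagged_address(tagged) == (5, 0x1234))
```

A consequence of this kind of layout is that the addressable range of each device shrinks by the number of bits given to the tag, which is one reason the placement is left as a design decision.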

The command is subsequently forwarded to selected storage device 106 through initiator transmit and receive block 208. Initiator transmit and receive block 208 is responsible for delivering the command to selected storage device 106. The elements of initiator transmit and receive block 208 are further described using FIG. 5. Once the command is forwarded to selected storage device 106, target transmit and receive block 204 starts monitoring hosts 102 for more commands and initiator transmit and receive block 208 monitors storage devices 106 for a response.

The above discussion refers in detail to the reception of a command. The other serial I/O structures received or transmitted are discussed together with the operation flow.

The command forwarded to selected storage device 106 may or may not be accepted by it. Selected storage device 106 may not accept the command if it detected a CRC error or a bit error while receiving the command. In such a scenario, host 102 tries the command again. Also, some storage devices may not accept more than one command at a time. This knowledge is kept in host 102 and device selector 206, and multiple commands are not issued to that storage device. If host 102 initiates a command to a storage device that is busy, device control unit 108 does not accept the command and host 102 tries again until the storage device is no longer busy.

Device control unit 108 also includes an error register 210 that stores the status of various connections between host 102, device control unit 108, and storage devices 106. This can be used to locate an error, if any, encountered while the coupling is established between hosts 102 and storage devices 106. Errors tracked by error register 210 include bit errors and Cyclic Redundancy Code (CRC) errors and the port at which they occurred. Please note that there is a unique error register 210 connected to each device control unit 108. For example, if there is a plurality of device control units 108, each device control unit is connected to error register 210.
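A software model of such an error register might look like the following sketch. The class name, the port labels and the per-port counters are assumptions for illustration; the specification only states that bit errors, CRC errors and the port at which they occurred are tracked.

```python
from collections import defaultdict


class ErrorRegister:
    """Hypothetical model: counts bit and CRC errors per port."""

    KINDS = ("bit", "crc")

    def __init__(self):
        self._counts = defaultdict(lambda: {kind: 0 for kind in self.KINDS})

    def record(self, port, kind):
        """Log one error of the given kind observed at the given port."""
        if kind not in self.KINDS:
            raise ValueError("unknown error kind: %r" % (kind,))
        self._counts[port][kind] += 1

    def count(self, port, kind):
        """Return how many errors of this kind were seen at this port."""
        return self._counts[port][kind] if port in self._counts else 0

    def faulty_ports(self):
        """List ports with at least one recorded error, to help isolate a fault."""
        return sorted(p for p, c in self._counts.items() if any(c.values()))


reg = ErrorRegister()
reg.record("dev3", "crc")
reg.record("dev3", "crc")
reg.record("host0", "bit")
print(reg.faulty_ports())
```

Keeping the port alongside the error type is what allows a fault to be narrowed down to one connection rather than to the interconnecting unit as a whole.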

In an embodiment of the present invention, each device control unit 108 also includes a data processing logic unit 212 to process data before it is stored in storage devices 106. In an embodiment of the invention, data processing logic unit 212 is connected between target transmit and receive block 204 and initiator transmit and receive block 208. The processing is done to produce error correction codes, which provide data integrity, to compress data to reduce actual storage, and to encrypt data for data security purposes.

The elements of target transmit and receive block 204 are described, hereinafter, using FIG. 3. Target transmit and receive block 204 comprises a link unit 301, a target transport transmit unit 302 and a target transport receive unit 304.

Link unit 301, defined as per the SATA standard definition, is responsible for the following operations on the received data: convert each 10-bit character to 8-bit data, remove primitives, unscramble the data, check CRC, and pack four bytes into 32-bit data with correct character boundaries. Link unit 301 is responsible for the following operations on the transmit data: calculate CRC for the transmit data, scramble the data, convert 32-bit data to four 10-bit characters, add the appropriate primitives and pack the four 10-bit characters on correct boundaries for transmission. Link unit 301 receives an element of the serial I/O structure from host 102 and converts it to 32-bit data for target transport receive unit 304. In an embodiment of the present invention, link unit 301 is implemented using a finite state machine on an integrated circuit. The design and logical interconnections of the finite state machine to achieve the above-described functionalities of link unit 301 will be apparent to a person skilled in the art.

Target transport receive unit 304 receives an element of the serial I/O structure from link unit 301. Further, it has an adequate buffer to hold data received from host 102 in case storage device 106 is not ready. Target transport receive unit 304 interprets the incoming serial I/O structure and signals device selector 206 as to what element of the serial I/O structure is being exchanged. It further communicates with data processing logic unit 212 via a finite state machine to pass the received I/O structure for any further processing. Depending on the type of the I/O structure, data processing logic unit 212 may pass the I/O structure directly to initiator transmit unit 502 or further process it prior to passing it.

Target transport transmit unit 302 forwards the request received from any of storage devices 106 to host 102, allowing link unit 301 to add the link functions to the data. It further communicates with data processing logic unit 212 via a finite state machine to receive the I/O structure received from initiator receive unit 504. Depending on the type of the I/O structure, data processing logic unit 212 may pass the I/O structure directly to target transport transmit unit 302 or further process it prior to passing it.
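The byte-packing step that link unit 301 performs on the receive side (four bytes into one 32-bit word) and its inverse on the transmit side can be sketched in Python. The little-endian byte order and the function names are assumptions made for this illustration; 10-bit decoding, scrambling, CRC and primitive handling are deliberately left out.

```python
def pack_dwords(data):
    """Pack a byte string into 32-bit words, four bytes per word.

    Assumes little-endian order (first byte is least significant) and a
    payload whose length is a multiple of four bytes.
    """
    if len(data) % 4 != 0:
        raise ValueError("payload is not aligned to a 32-bit boundary")
    return [int.from_bytes(data[i:i + 4], "little") for i in range(0, len(data), 4)]


def unpack_dwords(dwords):
    """Inverse of pack_dwords: split 32-bit words back into bytes."""
    return b"".join(d.to_bytes(4, "little") for d in dwords)


payload = bytes([0x27, 0x80, 0xEC, 0x00, 0x01, 0x02, 0x03, 0x04])
words = pack_dwords(payload)
print([hex(w) for w in words])
print(unpack_dwords(words) == payload)
```

Working on 32-bit boundaries in this way is what lets the downstream units handle the I/O structure without regard to the serial character format.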

The elements of device selector 206 are described, hereinafter, using FIG. 4. Device selector 206 comprises a command queue block 404, a read select logic unit 406, a write select logic unit 408 and a flow select logic unit 410.

Command queue block 404 stores a signature of a pending command while the command FIS is delivered to storage device 106. The pending command is stored if either selected storage device 106 has not yet responded to the command or other commands are being processed. Command queue block 404 is checked to identify if there is a command in the queue for execution.
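The bookkeeping role of command queue block 404 can be modeled with a short sketch. The choice of a (tag, device) pair as the stored signature and all method names are hypothetical; the specification does not define the contents of a signature.

```python
class CommandQueue:
    """Hypothetical model of a queue holding signatures of pending commands."""

    def __init__(self):
        self._pending = {}  # tag -> device, kept in delivery order

    def add(self, tag, device):
        """Store the signature of a command that was just delivered."""
        if tag in self._pending:
            raise ValueError("tag %r is already pending" % (tag,))
        self._pending[tag] = device

    def complete(self, tag):
        """Drop the signature once the device reports completion."""
        return self._pending.pop(tag)

    def has_pending(self):
        """Tell whether any command is still awaiting execution."""
        return bool(self._pending)

    def pending_for(self, device):
        """List the pending tags for one device, oldest first."""
        return [t for t, d in self._pending.items() if d == device]


queue = CommandQueue()
queue.add(0, "dev1")
queue.add(1, "dev2")
queue.add(2, "dev1")
queue.complete(0)
print(queue.pending_for("dev1"), queue.has_pending())
```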

Read and write select logic units 406 and 408 select storage device 106 to which the I/O structure is directed. Write select logic unit 408 selects a routing path from host 102 through interconnect routing unit 110 to selected storage device 106. Read select logic unit 406 selects a routing path from selected storage device 106 through interconnect routing unit 110 to host 102. Read select logic unit 406 and write select logic unit 408 generate the control signals for the multiplexers used in interconnect routing unit 110 to enable the routing path. Read select logic unit 406 and write select logic unit 408 ensure that the commands of the I/O structure can be delivered while storage devices 106 are monitored for communication.
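One way to picture the multiplexer control signals is as two tables of select values, one for each direction. The sketch below is an assumed software model only: the class name, the conflict rule and the index-based selects are illustrative and are not taken from the specification.

```python
class RoutingSelects:
    """Hypothetical model of the select signals for interconnect routing."""

    def __init__(self, num_control_units, num_devices):
        # Write direction: for each device-side multiplexer, the index of the
        # device control unit currently driving it (None when idle).
        self.write_select = [None] * num_devices
        # Read direction: for each control-unit-side multiplexer, the index of
        # the device interface currently feeding it (None when idle).
        self.read_select = [None] * num_control_units

    def select_write(self, unit, device):
        """Route output of a device control unit to a device interface."""
        if self.write_select[device] not in (None, unit):
            raise RuntimeError("device is already driven by another unit")
        self.write_select[device] = unit

    def release_write(self, device):
        """Free the write path so another control unit may use the device."""
        self.write_select[device] = None

    def select_read(self, unit, device):
        """Route output of a device interface to a device control unit."""
        self.read_select[unit] = device


routes = RoutingSelects(num_control_units=2, num_devices=4)
routes.select_write(0, 3)   # control unit 0 sends to device 3
routes.select_read(1, 2)    # control unit 1 listens to device 2
print(routes.write_select, routes.read_select)
```

Keeping the two directions separate in the model mirrors the description above, in which a command can be delivered on the write path while the read path is used to monitor devices.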

Flow select logic unit 410 arbitrates the order of processing of the device requests as well as negotiations with host 102 in case of non-queue capable storage devices 106 and non-queue commands. Flow select logic unit 410 receives signals from target transport receive unit 304 and initiator transmit and receive block 208 as to which serial I/O structure element is being exchanged. Flow select logic unit 410 is implemented by using a finite state machine whose operational flow is defined in FIGS. 7 a and 7 b.

Please note that the identification of queue capability of storage devices is performed when storage devices 106 are initially powered up and identified by the host. Subsequently, the knowledge is passed on to interconnecting unit 104. In case a storage device 106 is queue enabled, multiple commands can be delivered by device control unit 108 to that storage device. Queue-capable storage device 106 completes the I/O structure by executing the command, and interconnecting unit 104 merely acts as a router of the FIS between host 102 and storage device 106.

If storage device 106 is non-queue capable, interconnecting unit 104 modifies the tag and the command opcode and forwards the modified command FIS to non-queue capable storage device 106. When non-queue capable storage device 106 is ready to communicate with host 102, flow select logic unit 410 negotiates with host 102 before coupling storage device 106 and host 102 for communication. In particular, if host 102 sends a command to a non-queue capable storage device 106 that has previously received a command, interconnecting unit 104 retries that command until storage device 106 is no longer busy.
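The different treatment of queue capable and non-queue capable devices can be summarized in a small sketch that decides whether a new command may be issued. The class and method names are hypothetical, and the retry itself is left to the caller.

```python
class DeviceTracker:
    """Hypothetical tracker of outstanding commands per storage device."""

    def __init__(self, queue_capable):
        # queue_capable maps a device identifier to True or False, as learned
        # when the devices are identified at power-up.
        self.queue_capable = dict(queue_capable)
        self.outstanding = {device: 0 for device in self.queue_capable}

    def try_issue(self, device):
        """Return True if a command may be delivered now, False if it must be retried."""
        if not self.queue_capable[device] and self.outstanding[device] > 0:
            return False  # non-queue capable device is busy
        self.outstanding[device] += 1
        return True

    def complete(self, device):
        """Record that the device finished one command."""
        if self.outstanding[device] == 0:
            raise RuntimeError("no outstanding command for this device")
        self.outstanding[device] -= 1


tracker = DeviceTracker({"queued": True, "legacy": False})
print(tracker.try_issue("legacy"))   # accepted
print(tracker.try_issue("legacy"))   # busy, must be retried
tracker.complete("legacy")
print(tracker.try_issue("legacy"))   # accepted again
```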

The elements of initiator transmit and receive block 208 are described, hereinafter, using FIG. 5. Initiator transmit and receive block 208 comprises a pair of links 501 a and 501 b, an initiator transmit unit 502 and an initiator receive unit 504.

Links 501 a and 501 b are used to encode and decode storage protocol primitives. Link 501 a is connected to initiator transmit unit 502 and interconnect routing unit 110. Link 501 b is connected to initiator receive unit 504 and interconnect routing unit 110. Links 501 a and 501 b have functionality similar to link unit 301. Additionally, link 501 a has special states that resolve the collision of a serial I/O structure received from host 102 with one received from storage device 106. This case of a deadly embrace is resolved in the following way.

If host 102 is transmitting a software reset instruction to storage device 106, the serial I/O structure received from storage device 106 is aborted by interconnecting unit 104, through link 501 a, and the software reset instruction is forwarded to storage device 106. If the transmitted serial I/O structure is anything other than a software reset instruction to storage device 106, then interconnecting unit 104 aborts both operations and allows the host software to resolve the anomaly.
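The collision rule described above reduces to a small decision function. The enumeration members and the function signature below are illustrative names only.

```python
from enum import Enum


class CollisionAction(Enum):
    NO_COLLISION = "no collision"
    ABORT_DEVICE_FORWARD_RESET = "abort device structure, forward software reset"
    ABORT_BOTH = "abort both, leave resolution to host software"


def resolve_collision(host_sending, device_sending, host_is_software_reset):
    """Decide what to do when host and device transmit at the same time."""
    if not (host_sending and device_sending):
        return CollisionAction.NO_COLLISION
    if host_is_software_reset:
        # The reset takes priority over whatever the device was sending.
        return CollisionAction.ABORT_DEVICE_FORWARD_RESET
    return CollisionAction.ABORT_BOTH


print(resolve_collision(True, True, True).value)
print(resolve_collision(True, True, False).value)
```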

In an embodiment of the present invention, links 501 a and 501 b are implemented using a finite state machine on an integrated circuit. The design of the finite state machine to achieve the above-described functionalities of links 501 a and 501 b will be apparent to a person skilled in the art.

Initiator transmit unit 502 accepts the I/O structure, which was received by target transport receive unit 304, by communicating via a finite state machine. The I/O structure may be further modified by data processing logic unit 212 or passed through directly. If selected by write select logic unit 408, storage device 106 receives the I/O structure from initiator transmit unit 502 and signals the successful receipt of the I/O structure. This receipt is forwarded back through initiator transmit unit 502 and target transport receive unit 304 to host 102. At this point, an I/O structure is delivered to storage device 106 and, if the I/O structure was a command, device selector 206 has a signature of the pending command in its command queue block 404.

Flow select logic unit 410 instructs initiator receive unit 504 to monitor storage devices 106 for activity. The activity comprises a signal from selected storage device 106 indicating that it is ready to communicate. Initiator receive unit 504 interprets the incoming serial I/O structure (corresponding to an activity signal) and signals flow select logic unit 410 in device selector 206 as to which serial I/O structure element is being exchanged. Initiator receive unit 504 further communicates with data processing logic unit 212 via a finite state machine to pass the received I/O structure for any further processing. Depending on the type of the I/O structure, data processing logic unit 212 may pass the I/O structure directly to target transport transmit unit 302 or further process it prior to passing it. In this manner, other serial I/O structures are exchanged between host 102 and interconnecting unit 104, as well as between storage device 106 and host 102, until the command is executed to completion, i.e., either the read or the write operation is performed to completion. The flow of the data is further described in FIGS. 7 a and 7 b.

The present invention enables execution of multiple commands at the same time. In particular, multiple commands can be delivered to storage device 106, which executes them one-by-one, in any order. Flow select logic unit 410 can transmit a command to the same (selected) or a different storage device while initiator receive unit 504 is monitoring all storage devices 106 for activity. Further, device selector 206 tracks the delivery of the commands, and keeps track of them until its queue is empty.

In this way, more than one storage device 106 can be active at the same time. Specifically, multiple commands can be delivered to multiple storage devices 106. Each of these active storage devices 106 is then serviced according to the pre-specified fairness algorithm.
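
By way of illustration only, the tracking and servicing of multiple pending commands can be sketched in software as follows. This is a minimal model and not the circuit itself: the class name `CommandTracker`, its methods, and the round-robin policy are assumptions made for the example, since the description refers only to a pre-specified fairness algorithm without naming one.

```python
from collections import deque


class CommandTracker:
    """Hypothetical model of the command queue kept by the device selector."""

    def __init__(self):
        # Pending command signatures as (tag, device) pairs, in delivery order.
        self.pending = deque()

    def deliver(self, tag, device):
        """Record a command that has been delivered to a storage device."""
        self.pending.append((tag, device))

    def service_order(self, ready_devices):
        """Order pending commands of ready devices, one per device per pass.

        Round-robin is an assumed stand-in for the fairness algorithm.
        """
        per_device = {}
        for tag, device in self.pending:
            if device in ready_devices:
                per_device.setdefault(device, deque()).append(tag)
        order = []
        while per_device:
            for device in list(per_device):
                order.append((per_device[device].popleft(), device))
                if not per_device[device]:
                    del per_device[device]
        return order

    def complete(self, tag, device):
        """Drop a command from the queue once it has run to completion."""
        self.pending.remove((tag, device))
```

In this sketch, two commands queued for one device do not starve a command queued for another device, because each pass services at most one command per ready device.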

Host interface unit 107 is, hereinafter, described using FIG. 6 a. Host interface unit 107 comprises a target physical interface 602, and a controller connection 604 a. Target physical interface 602 connects host 102 to device control unit 108 in accordance with a point-to-point storage serial protocol such as SATA. Controller connection 604 a synchronizes the serial data received from host 102 to the running internal clock via the elasticity buffer.

Similarly, the connection between storage device 106 and device control unit 108 is established using device interface unit 112. The elements of device interface unit 112 are described, hereinafter, using FIG. 6 b. Device interface unit 112 comprises a controller connection 604 b, a link control unit 608 and an initiator physical interface 610.

Controller connection 604 b synchronizes the serial data received from storage device 106 to the running internal clock via the elasticity buffer. Link control unit 608 generates primitives when either of initiator transmit unit 502 or initiator receive unit 504 is not connected to device control unit 108. Please note that a given storage device 106 is not connected to the initiator receive and transmit unit 208 at all times. However, serial storage devices need to actively send and receive valid primitives. The valid primitives are generated by link control unit 608 while initiator transmit unit 502 or initiator receive unit 504 is not in communication with a particular storage device.

The connection between interconnecting unit 104 and storage device 106 is established over initiator physical interface 610. Initiator physical interface 610 is enabled for SATA/SAS-STP protocol. Please note that the implementation of target physical interface 602 and initiator physical interface 610, in accordance with a serial storage protocol such as SATA, is known in the art and should be apparent to a person skilled in the art.

The above-described working of the invention is further explained using a flowchart in FIG. 7 a and FIG. 7 b. In step 702, target transport receive unit 304 receives a command from host 102. In response, target transport receive unit 304 notifies device selector 206 and also forwards the command to initiator transmit unit 502. Based on the received tag, device selector 206 selects a storage device 106 in step 704. Further, device selector 206 selects a proper routing through interconnect routing unit 110 to create a connection between initiator transmit unit 502 and device interface unit 112. Once the connection is created, initiator transmit unit 502 delivers the command to selected storage device 106 in step 706. Device interface unit 112 serializes the data to be transmitted to selected storage device 106. Initiator transmit unit 502 and target transport receive unit 304 remain engaged until an acknowledgement of proper receipt of the command is received from storage device 106. Subsequently, initiator transmit unit 502, device selector 206 and target transport receive unit 304 each, in turn, forward the receipt until it is received by host 102.

Meanwhile, in step 708, initiator receive unit 504 monitors the plurality of storage devices 106 for a response. The order in which the plurality of storage devices 106 are monitored is the same as the order in which the commands are delivered. Alternatively, a storage device that responds, thereby indicating that it is ready for communication, is served first. If a response is detected, the subsequent steps are determined by the nature of the storage device, i.e., whether responding storage device 106 is queue or non-queue capable. At the same time, in step 710, target transport receive unit 304 monitors host 102 for another command.
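
The two monitoring orders described for step 708 can be contrasted with a short sketch. The function name `next_device_to_serve` and the policy labels `"delivery_order"` and `"first_responder"` are hypothetical names chosen for this example; the description itself only states the two alternatives.

```python
def next_device_to_serve(delivery_order, ready, policy="delivery_order"):
    """Pick the storage device to serve next, or None if none is ready.

    delivery_order: devices in the order their commands were delivered.
    ready: devices that signalled readiness, in the order they responded.
    """
    if policy == "delivery_order":
        # Serve ready devices in the order the commands were delivered.
        for device in delivery_order:
            if device in ready:
                return device
        return None
    if policy == "first_responder":
        # Serve whichever device with a pending command responded first.
        for device in ready:
            if device in delivery_order:
                return device
        return None
    raise ValueError(f"unknown policy: {policy}")
```

With commands delivered to devices a, b and c in that order, and responses arriving from c and then b, the first policy serves b while the second serves c.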

In step 711, when responding storage device 106 is selected through device selector 206, it is determined if responding storage device 106 is queue capable or not.

If responding storage device 106 is queue capable, then in step 712, read select logic 406 routes the response from responding storage device 106 to initiator receive unit 504, which then forwards the response to host 102 through target transport transmit unit 302. There are several serial I/O structures that are related to negotiation between host 102 and responding storage device 106 as to which tag is being processed. In particular, if responding storage device 106 is not queue capable, then in step 714, flow select logic unit 410 negotiates with host 102. The negotiation is done to ensure that the tag with which responding storage device 106 was initially selected by host 102 is presented back to host 102 during the negotiations.
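
The difference between steps 712 and 714 can be illustrated with a minimal sketch. The function `tag_for_host` and the `selected_tags` mapping are hypothetical, and the assumption that a queue-capable device reports its own tag is made only for the purpose of the example.

```python
def tag_for_host(device, response_tag, queue_capable, selected_tags):
    """Return the tag presented to the host for a device response.

    selected_tags maps each device to the tag the host used when it
    initially selected that device (hypothetical bookkeeping).
    """
    if queue_capable:
        # Step 712: the response, including its tag, is routed to the host.
        return response_tag
    # Step 714: the device cannot supply a tag, so the tag remembered at
    # selection time is presented back to the host during negotiation.
    return selected_tags[device]
```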

Subsequently, in step 716, flow select logic unit 410 locks host 102 and responding storage device 106. Locking is done to ensure that host 102 and responding storage device 106 exchange serial I/O structures until the user data is exchanged to completion. In step 718, it is determined if the operation is a read operation or a write operation. If the operation is a read operation, responding storage device 106 sends read data to host 102, at step 720, in a single or multiple serial I/O structures, followed by a completion status at step 724. If the operation is a write operation, storage device 106 sends a DMA activated serial I/O structure, at step 722, to enable host 102 to send the write data for every serial I/O structure that needs to be transmitted from host 102 to responding storage device 106. Once host 102 receives the DMA activated serial I/O structure, it sends the write data to responding storage device 106. Responding storage device 106 then sends a completion status to host 102 at step 724. Subsequently, at step 726, flow select logic unit 410 checks whether there is any command in the queue. It then samples storage devices 106 that have a command in the queue to see if they are ready for communication. If any storage device 106 is ready for communication, flow select logic unit 410 returns to step 711.
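
The exchange that takes place while host 102 and responding storage device 106 are locked (steps 718 through 724) can be sketched as a sequence of sender and I/O structure pairs. The function `exchange_sequence` and the string labels are illustrative assumptions; the sketch follows the SATA I style negotiation described above, with one DMA activation per write data structure.

```python
def exchange_sequence(operation, data_structures):
    """List the (sender, I/O structure) exchanges for one locked pair.

    operation: "read" or "write".
    data_structures: number of serial I/O structures carrying user data.
    """
    steps = []
    if operation == "read":
        # Step 720: the device sends the read data in one or more structures.
        for i in range(data_structures):
            steps.append(("device", f"read data {i}"))
    elif operation == "write":
        # Step 722: each write data structure is preceded by a DMA activation.
        for i in range(data_structures):
            steps.append(("device", "DMA activate"))
            steps.append(("host", f"write data {i}"))
    else:
        raise ValueError(f"unknown operation: {operation}")
    # Step 724: the device reports the completion status.
    steps.append(("device", "completion status"))
    return steps
```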

Please note that the above-described negotiation related to DMA activation is specific to SATA I protocol and has been described for illustration purposes only. It will be apparent to a person skilled in the art that negotiations performed using SATA II and other serial interfaces are different. The serial I/O structures used for DMA activation or completion status vary according to the serial protocol used.

To summarize, every serial I/O structure that is sent from host 102 to storage device 106 travels from host interface unit 107 through target transport receive unit 304 to write select logic 408, which selects a routing path through interconnect routing unit 110 to a storage device 106. The routing is through initiator transmit unit 502, interconnect routing unit 110, and device interface unit 112. Every serial I/O structure that is sent from storage device 106 to host 102 travels from device interface unit 112 to read select logic 406, which selects the proper routing through interconnect routing unit 110. The routing path is through initiator receive unit 504 and target transport transmit unit 302, and finally to host interface unit 107.
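
For reference, the two routing paths summarized above can be written out as a small lookup table. The dictionary `ROUTES` and the helper `describe_route` are hypothetical names; the element names and reference numerals are taken from the description, and the ordering of hops in the device-to-host direction is one reading of the summary.

```python
# Data hops and the selecting logic for each direction of travel.
ROUTES = {
    "host_to_device": {
        "selector": "write select logic 408",
        "hops": [
            "host interface unit 107",
            "target transport receive unit 304",
            "initiator transmit unit 502",
            "interconnect routing unit 110",
            "device interface unit 112",
        ],
    },
    "device_to_host": {
        "selector": "read select logic 406",
        "hops": [
            "device interface unit 112",
            "interconnect routing unit 110",
            "initiator receive unit 504",
            "target transport transmit unit 302",
            "host interface unit 107",
        ],
    },
}


def describe_route(direction):
    """Render one routing path as a readable string."""
    route = ROUTES[direction]
    return f"{' -> '.join(route['hops'])} (selected by {route['selector']})"
```

Note that the return path is not simply the forward path reversed: the receive units replace the transmit units, and vice versa.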

The present invention has been described using SATA for illustration purposes only. It will be apparent to a person skilled in the art that any other serial link protocol, such as SAS or SCSI, can be used to work the invention without departing from the scope of the invention. Further, the structure of the command and data exchanged (as serial I/O structure) between hosts 102 and storage devices 106 would depend on the serial link protocol used. Further, it will be apparent to a person skilled in the art that the serial I/O structures that need to be exchanged during negotiations between host 102 and storage device 106 would depend on the serial link protocol used.

In an embodiment of the present invention, each of target transport transmit unit 302, target transport receive unit 304, device selector 206, initiator transmit unit 502, and initiator receive unit 504 is implemented as a finite state machine on an FPGA. For example, in an embodiment of the invention, target transport transmit unit 302 is implemented as a finite state machine that receives a 32-bit serial I/O structure as the input. Similarly, device selector 206 can be implemented as a finite state machine that performs the above-described functions of device selector 206.

The above-described system and method have various advantages.

First, device control unit 108 is simple in design and is less complex than existing device control units. In particular, device control unit 108 uses a single target transmit and receive block 204 and a single initiator transmit and receive block 208 per host, rather than one target transmit and receive block 204 per host and one initiator transmit and receive block 208 per storage device, as in existing systems. Consequently, the number of gates required for implementing device control unit 108 is substantially lower than that for the existing systems.

For example, consider the case in which the gate count for a target transmit and receive block is x and the gate count for an initiator transmit and receive block is y. The system described in the present invention would have a gate count of (x + y) * (number of hosts). Prior art systems, on the other hand, have a gate count of (x + (number of devices * y)) * (number of hosts), which is much higher.
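
The comparison can be made concrete with a short calculation. The function names are hypothetical, and the gate counts and the numbers of hosts and devices used in the usage note are purely illustrative values that do not appear in the description.

```python
def gate_count_shared(x, y, hosts):
    """Gate count with one target block (x) and one initiator block (y) per host."""
    return (x + y) * hosts


def gate_count_prior_art(x, y, hosts, devices):
    """Gate count with one initiator block (y) per storage device, per host."""
    return (x + devices * y) * hosts
```

Assuming, for illustration, x = 10,000 gates, y = 15,000 gates, 2 hosts and 16 storage devices, the first function gives 50,000 gates and the second gives 500,000 gates. The two formulas differ by (number of devices - 1) * y * (number of hosts), so the saving grows with the number of storage devices.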

Second, device control unit 108 achieves a higher throughput than existing device control units. In particular, distribution of multiple commands to multiple storage devices allows concurrent processing across the plurality of storage devices 106. This is advantageous because storage devices 106 operate at a data rate that is lower than that of hosts 102. Consequently, each of hosts 102 can overlap multiple commands across the plurality of storage devices 106 at the same time.

Third, the use of a plurality of device control units 108 creates path redundancy as well as data parallelism. This allows two or, in general, multiple hosts to communicate concurrently with as many storage devices as there are hosts. Device selector 206 in each device control unit 108 allows more than one host to access each of storage devices 106. Consequently, even if a failure of one of the communication paths is detected by software, the communication can still be performed. The multiple communication paths also create parallelism and enable load balancing of the workload.

Fourth, the invention improves the time for recovery in the case of failures. This is achieved by using error register 210 to isolate the fault location. This fault isolation scheme ensures that the fault is detected and located and that quick recovery is effected. Several types of fault related to serial interfaces, such as CRC error, bit error, and loss of synchronization, are detected. Some are retried by hardware and some cause a failure in the I/O, which requires software to re-issue the serial I/O structure.

Fifth, the present invention also allows a non-queue capable device, or a different generation of a storage device, to be connected to a queue capable host. The negotiation is handled by interconnecting unit 104 and is transparent to host 102. This allows inexpensive drives to be connected to an expensive host.

Sixth, device control unit 108 processes data destined for storage devices 106. Data destined for storage devices 106 can be processed to produce error correction code (ECC) to provide data integrity, compressed to reduce actual storage, or encrypted for data security.

While the preferred embodiments of the invention have been illustrated and described, it will be clear that the invention is not limited to these embodiments only. Numerous modifications, changes, variations, substitutions and equivalents will be apparent to those skilled in the art, without departing from the spirit and scope of the invention, as described in the claims.

1. An interconnecting unit for coupling a plurality of hosts to a plurality of storage devices, the coupling involving exchange of a plurality of serial I/O structures between the plurality of hosts and the plurality of storage devices, the interconnecting unit comprising: a. a plurality of device control units enabling distribution of a plurality of commands to the plurality of storage devices, each of the plurality of commands being an element of a serial I/O structure; b. a plurality of host interface units, each of the plurality of host interface units synchronizing data between a host and the interconnecting unit; c. a plurality of device interface units, each of the plurality of device interface units synchronizing data between a storage device and the interconnecting unit; and d. an interconnect routing unit situated between a device control unit and a device interface unit, wherein the device control unit is configured to select a routing path out of possible routing paths through the interconnect routing unit between the device control unit and the device interface unit upon receiving a command from a host to create a connection between the host and a storage device to allow processing of the command, which is an element of a serial I/O structure.
2. The interconnecting unit according to claim 1, wherein each of the plurality of the elements of a serial I/O structure is transmitted in a serial bit format.
3. The interconnecting unit according to claim 1, wherein each of the plurality of host interface units converts each of the plurality of the received elements of the serial I/O structure in a serial bit stream format to a character bit format.
4. The interconnecting unit according to claim 1, wherein each of the plurality of device interface units converts each of the plurality of the received elements of the serial I/O structure in a serial bit stream format to a character bit format.
5-18. (canceled)
19. The interconnecting unit according to claim 1, wherein the routing path includes a read routing path from the host to storage device through the interconnect routing unit and the write routing path from the storage device to the host through the interconnect routing unit, the read routing path and the write routing path being selected by the device control unit.
20. The interconnecting unit according to claim 19 wherein the device control unit comprises a read select unit configured to select the read routing path and a write select routing unit configured to select the write routing path.
21. The interconnecting unit according to claim 20, wherein the plurality of device control units each include a single read select unit and a single write select unit for a single host.
22. The interconnecting unit according to claim 1, wherein the device control unit serializes the data for sending to the interconnect unit over a serial link.
23. The interconnecting unit according to claim 1, wherein the device interface unit serializes the data for the storage device over a serial link.
24. A method for coupling a host to a storage device, the method comprising: receiving, at a device control unit, a command from a host, the command being an element of a serial I/O structure; determining a storage device in which to route the command; selecting, at the device control unit, a routing path out of possible routing paths through an interconnect routing unit situated between the device control unit and a device interface unit upon receiving the command from the host to create a connection between the host and the storage device to allow processing of the command; routing the command to the device interface unit through the routing path; serializing data for the command at the device interface unit; and sending the serialized data for the command to the storage device.
25. The method of claim 24, further comprising: monitoring the storage device for a response; determining if the storage device is queue capable; and routing the response to the host if the storage device is queue capable.
26. The method of claim 25, further comprising locking the host and the storage device to allow exchanging of serial I/O structures for the command.
27. The method of claim 25, further comprising negotiating with the host to indicate which storage device is going to be used.
28. The method of claim 24, further comprising: serializing data for the command from the host; and sending the serialized data for the command to the interconnect unit from the device control unit.
29. An apparatus comprising: one or more processors; and logic encoded in one or more tangible media for execution by the one or more processors and when executed operable to: receive, at a device control unit, a command from a host, the command being an element of a serial I/O structure; determine a storage device in which to route the command; select, at the device control unit, a routing path out of possible routing paths through an interconnect routing unit situated between the device control unit and a device interface unit upon receiving the command from the host to create a connection between the host and the storage device to allow processing of the command; route the command to the device interface unit through the routing path; serialize data for the command at the device interface unit; and send the serialized data for the command to the storage device.
30. The apparatus of claim 29, wherein the logic when executed is further operable to: monitor the storage device for a response; determine if the storage device is queue capable; and route the response to the host if the storage device is queue capable.
31. The apparatus of claim 30, wherein the logic when executed is further operable to lock the host and the storage device to allow exchanging of serial I/O structures for the command.
32. The apparatus of claim 30, wherein the logic when executed is further operable to negotiate with the host to indicate which storage device is going to be used.
33. The apparatus of claim 29, wherein the logic when executed is further operable to: serialize data for the command from the host; and send the serialized data for the command to the interconnect unit from the device control unit.